We present a unified and compact representation for object rendering, 3D reconstruction, and grasp pose prediction that can be inferred from a single image within a few seconds. We achieve this by leveraging recent advances in the Neural Radiance Field (NeRF) literature that learn category-level priors and fine-tune on novel objects with minimal data and time. Our insight is that we can learn a compact shape representation and extract meaningful additional information from it, such as grasping poses. We believe this to be the first work to retrieve grasping poses directly from a NeRF-based representation using a single viewpoint (RGB-only), rather than going through a secondary network and/or representation. Compared to prior art, our method is two to three orders of magnitude smaller while achieving comparable performance at view reconstruction and grasping. Accompanying our method, we also propose a new dataset of rendered shoes for training a sim-to-real NeRF method with grasping poses for grippers of different widths.
Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To reduce that effort, large language models (LLMs) can be used to score potential next actions during task planning, or even to generate action sequences directly, given a natural language instruction with no additional domain information. However, such methods either require enumerating all possible next steps for scoring, or generate free-form text that may contain actions not possible on a given robot. We present a programmatic LLM prompt structure that enables plan generation across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the actions and objects available in the environment, as well as with example programs that can be executed. We make concrete recommendations about prompt structure and generation constraints through ablation experiments, demonstrate state-of-the-art success rates on VirtualHome household tasks, and deploy our method on a physical robot arm for tabletop tasks. Website: progprompt.github.io
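The abstract's key insight, prompting an LLM with program-like specifications of actions and objects plus executable example programs, can be illustrated with a minimal sketch. The action names, object list, and prompt layout below are hypothetical placeholders, not the paper's actual prompt format:

```python
def build_prompt(actions, objects, example_programs, task):
    """Assemble a program-like planning prompt for an LLM.

    Available actions appear as function stubs, the scene's objects as a
    Python list, and prior plans as few-shot example programs; the LLM is
    asked to complete the body of the final task function.
    """
    action_stubs = "\n".join(f"def {name}(obj): ..." for name in actions)
    scene = f"objects = {objects}"
    examples = "\n\n".join(example_programs)
    query = f"def {task}():"  # the LLM continues from here
    return "\n\n".join([action_stubs, scene, examples, query])

prompt = build_prompt(
    actions=["grab", "put_on", "open_door"],          # hypothetical robot skills
    objects=["apple", "table", "fridge"],             # hypothetical scene objects
    example_programs=[
        "def put_apple_on_table():\n"
        "    grab('apple')\n"
        "    put_on('apple', 'table')"
    ],
    task="put_apple_in_fridge",
)
```

Because the prompt is structured as code, the model's continuation is naturally constrained to calls over the declared actions and objects, which is what lets the generated plan stay executable on the given robot.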
Natural language provides an accessible and expressive interface for specifying long-horizon tasks for robotic agents. However, non-experts may specify such tasks with high-level instructions that abstract over specific robot actions through several layers of abstraction. We propose that persistent representations are key to bridging this gap between language and robot actions over long execution horizons. We present a method for building a persistent spatial semantic representation, and show how it enables an agent that performs hierarchical reasoning to effectively execute long-horizon tasks. We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.